Introduction

Background, what’s out there (visualization tools), why this is useful (because there are not that many detailed examples showing the code; talk about your experience at Sunbelt, “what’s the format of the data”; look for papers on computing literacy), and our goal (start-to-finish network visualization: load the data, process it a little, and plot it).

Network visualization in a nutshell: Things to consider

One important aspect of network visualization is the layout algorithm used to position the vertices. There are a number of popular layout algorithms, and each has its own strengths and weaknesses. The “Circle” algorithm places clusters on a circle and draws straight lines between vertices (Six and Tollis 1999). It works well for showing biconnectivity and subnetworks, but the number of edges needs to be relatively low for connections to remain legible (Six and Tollis 1999). DrL (Distributed Recursive Layout) employs a multilevel force-directed algorithm based on simulated annealing, and it works well for large abstract datasets (Martin, Brown, and Wylie 2007). The Fruchterman-Reingold layout treats vertices and edges as atomic bodies subject to repulsive and attractive forces and seeks to minimize the energy of the system (Fruchterman and Reingold 1991; Hansen, Shneiderman, and Smith 2011), which benefits large social networks. The Kamada-Kawai layout uses a spring algorithm to lay out undirected graphs in a symmetric drawing with few edge crossings, which works well for general network structures (Kamada and Kawai 1989). LGL (Large-Graph Layout) is based on a mass-spring algorithm, using edges as springs to pull vertices together while a repulsive force prevents overlapping, making it highly effective for dense biological networks (Adai et al. 2004).
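As a rough illustration of how the choice of layout changes a drawing, the sketch below computes three of the layouts discussed above on a small built-in graph using igraph. The choice of Zachary's karate club graph and the plotting settings are assumptions made for this example only; igraph also provides layout_with_drl() and layout_with_lgl() for the DrL and LGL algorithms.

```r
library(igraph)

# Zachary's karate club: a small built-in example graph (34 vertices)
g <- make_graph("Zachary")

set.seed(1)  # force-directed layouts start from random positions
layouts <- list(
  circle = layout_in_circle(g),
  fr     = layout_with_fr(g),
  kk     = layout_with_kk(g)
)

# Each layout is a matrix with one row per vertex and two columns (x, y)
sapply(layouts, dim)

# Draw the same graph under each layout, side by side
op <- par(mfrow = c(1, 3), mar = c(0, 0, 2, 0))
for (nm in names(layouts)) {
  plot(g, layout = layouts[[nm]], vertex.label = NA, vertex.size = 8, main = nm)
}
par(op)
```

Because only the coordinates differ, the same graph object can be reused with any layout matrix.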

Another aspect to consider is the set of graphing parameters. Vertex size can convey varying weights and changes in information, as demonstrated by Zien, Schlag, and Chan (1999) and by Sharma and Chou (2022), where vertex size is determined by the number of outgoing edges. Vertex size can also illustrate increases or decreases in data, such as counts, as in Knisley and Knisley (2014). The color of vertices is equally significant, not only for visual appeal but also for differentiating objects or levels (Ognyanova, n.d.). Vertex color can also help visualize groupings, patterns, or clusters (Tyner, Briatte, and Hofmann 2017). Like color, the shape of vertices contributes to both aesthetic appeal and data distinction, and Grapov and Newman (2012) show how a combination of vertex shape, size, and color can distinguish data points. Finally, the width of edges displays the strength of connections between vertices, allowing users to see varying degrees of connection intensity. Lin (2018) provides an example where edge widths are proportional to other measured aspects in the study. Used thoughtfully, these components make network visualization a powerful tool for understanding intricate relationships within data.
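To make the mapping from data to graphing parameters concrete, the following base R sketch rescales out-degree into a vertex size range and edge weights into an edge width range. The helper rescale_to_range() and the toy edge list are hypothetical and exist only for this illustration; plotting packages typically perform a similar rescaling internally.

```r
# Hypothetical helper: linearly rescale a numeric vector into [lo, hi]
rescale_to_range <- function(x, lo, hi) {
  rng <- range(x)
  # If all values are equal, place everything at the midpoint
  if (rng[1] == rng[2]) return(rep((lo + hi) / 2, length(x)))
  lo + (x - rng[1]) / (rng[2] - rng[1]) * (hi - lo)
}

# Toy directed edge list with tie strengths (made-up data)
edges <- data.frame(
  from   = c("a", "a", "a", "b", "c"),
  to     = c("b", "c", "d", "c", "d"),
  weight = c(1, 5, 2, 3, 4)
)
vertices <- c("a", "b", "c", "d")

# Out-degree: number of outgoing edges per vertex (0 for vertices with none)
out_degree <- as.vector(table(factor(edges$from, levels = vertices)))
names(out_degree) <- vertices

# Vertex size follows out-degree; edge width follows tie strength
vertex_size <- rescale_to_range(out_degree, 0.015, 0.065)
edge_width  <- rescale_to_range(edges$weight, 0.25, 1)

vertex_size
edge_width
```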

The type of data needs to be taken into consideration as well. Egocentric data encompass diverse types of social network measurements, including degree of mesh (Barnes 1954), level of knittedness (Bott 2002), local or cosmopolitan orientation (Merton 1968), strength of ties (Granovetter 1973), family or friend networks (Wellman 1979), and more. These measurements revolve around the social relationships in a central individual’s immediate context, offering insights into that individual’s social status and the flow of information, support, or resources (Marsden and Hollstein 2023). Applications range from health-related topics (Burgette et al. 2021) and social behaviors (Carrasco 2008) to ecological data (Mascareno 2020) and beyond. Network analysis also involves small-world networks, which exhibit high clustering and short characteristic path lengths; examples include brain networks (Bassett 2006), electric power grids and airport connections (Amaral 2000), and social connections (Newman 2000), among others. At a much larger scale, networks can comprise millions or even billions of nodes and edges, capturing connections within a community; examples include social media platforms, mobile phone networks, and website connections (Blondel 2008). Finally, bipartite networks, which model relationships between two distinct sets of entities, find applications in various fields, including microbiology (Corel 2018), plant-animal mutualistic networks (Jordano), and artistic collaboration networks (Uzzi 2005), among others (Banerjee 2017). Understanding these different types of data and their applications provides valuable insight into the complexities of interconnected systems.
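As a small illustration of how bipartite data differ from ordinary one-mode data, the base R sketch below stores a two-mode network as an incidence matrix and projects it onto one set of entities. The artists and productions are made up for this example.

```r
# Made-up incidence matrix: rows are artists, columns are productions
B <- matrix(
  c(1, 1, 0,
    1, 0, 1,
    0, 1, 1,
    0, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("Ann", "Bo", "Cy", "Di"), c("show1", "show2", "show3"))
)

# One-mode projection: entry [i, j] counts the productions shared by artists i and j
co_membership <- B %*% t(B)
diag(co_membership) <- 0  # drop self-ties

co_membership
```

The projected matrix can then be treated as a weighted adjacency matrix among artists, at the cost of losing the information about which production created each tie.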

Start to finish examples

In this section, we present two full-length examples of network visualization. In both, we start with raw data sets and walk through how to read and process the data and how to build a visualization step by step. Throughout the paper, we use the igraph (Csárdi et al. 2023; Csardi and Nepusz 2006), data.table (Dowle and Srinivasan 2023), and netplot (Vega Yon and Bischoff 2023) R packages. We start by loading those packages:

library(igraph)
library(data.table)
# devtools is only needed to install netplot from GitHub (uncomment on first use)
library(devtools)
# install_github("USCCANA/netplot")
library(netplot)

Example 1: School data

For the first example, we will use a data set from the paper titled “Estimates of Social Contact in a Middle School Based on Self-Report and Wireless Sensor Data” by Leecaster et al., which features the social networks of 7th and 8th-grade students. We have identifiers such as gender, lunch period, and grade, which we will use for building our visualization.

Cleaning Data

First, we read in the data and take a quick look at it.

# loading the data
students      <- fread("./data/middle_school/pone.0153690.s001.csv")
interactions  <- fread("./data/middle_school/pone.0153690.s003.csv")

head(students)
##      id grade gender unique lunch initialsNum
## 1: 2003     7      0      0     1         386
## 2: 2004     8      1      1     1         402
## 3: 2006     7      1      1     2         288
## 4: 2008     8      0      1     1         199
## 5: 2009     7      1      0     1         147
## 6: 2010     8      1      0     1         157
head(interactions)
##      id contactGender contactGrade contactId ClassPeriod contactInitialNum
## 1: 2004             1            8      3127           4               323
## 2: 2004             0            8      2620           1               335
## 3: 2004             1            8        99           1               401
## 4: 2004             1            8        99           9               401
## 5: 2004             1            8        99           9               401
## 6: 2004             1            8        99           9               401

Before we can use the data, we need to remove the missing values (NAs) and miscoded entries from both data sets. We also see a large number of students who have no interactions with anyone other than themselves during the day; these “isolates” need to be removed so that the graph is easier to read.

# filtering out 'N/A's in the 'students' data frame
students       <- students[!is.na(id)]

# filtering down to gender being "0" or "1"
students       <- students[gender %in% c("0", "1")]

# filter out 'N/A's in 'id' and 'contactId' 
interactions   <- interactions[!is.na(id) & !is.na(contactId)]

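# IDs of the students that remain after cleaning
ids            <- sort(unique(students$id))

# keep only interactions in which both parties are valid students
# (this narrows the data from 10781 to 5150 rows)
interactions   <- interactions[(id %in% ids) & (contactId %in% ids)]

# load the color_nodes() helper used in the plots below
source(file = "./misc/color_nodes_function.R")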
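The effect of these filtering steps can be checked on a toy example. The sketch below applies the same data.table expressions to two made-up tables; the names toy_students, toy_interactions, and valid_ids, and the value "9" standing in for a miscoded gender, are assumptions for illustration.

```r
library(data.table)

# Made-up stand-ins for the two files
toy_students <- data.table(
  id     = c(1, 2, 3, NA),
  gender = c("0", "1", "9", "1")  # "9" plays the role of a miscoded value
)
toy_interactions <- data.table(
  id        = c(1, 1, 2, 3, NA),
  contactId = c(2, 3, 1, 1, 2)
)

# Same steps as above: drop missing IDs, then miscoded genders
toy_students     <- toy_students[!is.na(id)]
toy_students     <- toy_students[gender %in% c("0", "1")]
toy_interactions <- toy_interactions[!is.na(id) & !is.na(contactId)]

# Keep only interactions between students that survived the cleaning
valid_ids        <- sort(unique(toy_students$id))
toy_interactions <- toy_interactions[(id %in% valid_ids) & (contactId %in% valid_ids)]

toy_interactions
```

Only the two rows linking students 1 and 2 remain, because student 3 was dropped for a miscoded gender and one interaction had a missing ID.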

Next, the two data sets are combined into a single network object.

## Creating an igraph object from the two data sets
net                   <- graph_from_data_frame(
                          d = interactions[, .(id, contactId)],
                          directed = FALSE, vertices = as.data.frame(students)
                        )

## Getting only connected individuals
net_with_no_isolates  <- induced_subgraph(net, which(degree(net) > 0))
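The same two steps can be tried on a toy network where the result is easy to verify by hand. The data frames toy_vertices and toy_edges below are made up for this sketch; the call to simplify() is an optional extra step, not something the analysis above uses.

```r
library(igraph)

toy_vertices <- data.frame(id = 1:4, grade = c(7, 7, 8, 8))
# The 2-3 contact is listed twice, as repeated contacts are in the raw data
toy_edges    <- data.frame(id = c(1, 2, 2), contactId = c(2, 3, 3))

toy_net <- graph_from_data_frame(toy_edges, directed = FALSE, vertices = toy_vertices)

degree(toy_net)  # vertex 4 has degree 0, so it is an isolate

# Keep only the vertices with at least one connection
connected <- induced_subgraph(toy_net, which(degree(toy_net) > 0))
vcount(connected)

# Repeated rows become parallel edges; simplify() collapses them if preferred
ecount(connected)
ecount(simplify(connected))
```

Vertex attributes such as grade travel with the vertices, so they remain available after the isolates are dropped.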

Finally, we plot the network with netplot’s default settings.

## Plot with no isolates
set.seed(3)
nplot(
  net_with_no_isolates
) 

Vertex Options

Starting from this network and plot, we can customize a number of aspects of the graph. First, in order to work with the “color_nodes” function, we need to make “grade” a factor instead of a numeric variable. We also choose the colors we would like the nodes to have.

## convert 'grade' to a factor
V(net_with_no_isolates)$grade <- as.factor(V(net_with_no_isolates)$grade)

# plotting connections among grades ####
set.seed(3) 

a_colors                      <- color_nodes(net_with_no_isolates,"grade", c("gray40","red3"))
attr(a_colors, "map")
##         7         8 
## "#666666" "#CD0000"
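The mapping shown above, one color per factor level, can be sketched in base R. The helper map_levels_to_colors() below is hypothetical and is not the color_nodes() function used in this paper; it only illustrates the idea of assigning colors by level and keeping the lookup table as an attribute.

```r
# Hypothetical helper, not the color_nodes() used above: one color per factor level
map_levels_to_colors <- function(x, colors) {
  x <- as.factor(x)
  stopifnot(length(colors) >= nlevels(x))
  # Convert color names to hex codes
  hex <- rgb(t(col2rgb(colors)), maxColorValue = 255)
  # Named lookup table: level -> hex color
  lookup <- setNames(hex[seq_len(nlevels(x))], levels(x))
  # One color per observation, with the lookup table stored as an attribute
  structure(unname(lookup[as.character(x)]), map = lookup)
}

grade  <- c(7, 8, 8, 7, 8)
colors <- map_levels_to_colors(grade, c("gray40", "red3"))
attr(colors, "map")
```

Keeping the lookup table alongside the colors makes it straightforward to build a legend later.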

Now we can plot the data. This is the same network we plotted above, but with the following adjustments to the nodes:

  • Color the vertices (‘vertex.color’) according to the grade the student is in (with 7th graders being gray and 8th graders being red).

  • Adjust the shape of the vertices (‘vertex.nsides’). Seventh graders are drawn as near-circles (10-sided polygons) and eighth graders as triangles.

  • Adjust the size of the vertices (‘vertex.size.range’).

  • Remove the node labels (‘vertex.label’).

set.seed(3)
grades   <- nplot(
             net_with_no_isolates,
             vertex.color         = color_nodes(net_with_no_isolates, "grade", c("gray40","red3")),
             vertex.nsides        = ifelse(V(net_with_no_isolates)$grade == 7, 10, 3),
             vertex.size.range    = c(0.015, 0.020),
             vertex.label         = NULL)

print(grades)

This looks good, but let’s alter the parameters we just set to give the plot a different look.

  • Change vertex.color to be tied to a color palette.

  • Adjust vertex.nsides to make 7th graders be an octagon and 8th graders be a hexagon.

  • Adjust vertex.size.range, making each vertex smaller.

  • Add and adjust vertex labels with the vertex.label.* arguments:

    • vertex.label.fontsize adjusts the font size.

    • vertex.label.show adjusts the proportion of labels to keep.

  • Adjust vertex.frame.color to give an outline of each vertex.

library(RColorBrewer)

# Create a color palette using RColorBrewer (brewer.pal() returns at least 3 colors)
palette <- brewer.pal(3, "Set1")  # Change the number and palette name as needed

set.seed(3)
grades <- nplot(
  net_with_no_isolates,
  vertex.color           = color_nodes(net_with_no_isolates, "grade", palette),
  vertex.nsides          = ifelse(V(net_with_no_isolates)$grade == 7, 8, 6),
  vertex.size.range      = c(0.01, 0.011),
  vertex.label.fontsize  = 10,
  vertex.label.show      = .25,
  vertex.frame.color     = "black")

print(grades)

Edge Options

Now that we have explored a bit about vertices, let’s dive into options related to edges.

  • Change edge.width.range to make the size of the edges wider or thinner.

  • Change edge.color to blue.

  • Change edge.color.alpha to adjust transparency.

set.seed(3)
grades <- nplot(
  net_with_no_isolates,
  vertex.label=NULL,
  edge.width.range = c(.25,1),
  edge.color = "dodgerblue4",
  edge.color.alpha = .33)

print(grades)

Now, let’s adjust everything again, showing some of the things that netplot can do with edges.

  • Adjust edge.color so that each edge is colored as a gradient between the colors of the two vertices it connects (ego and alter).

  • Set edge.curvature to 0 to draw edges as straight lines.

  • Set edge.line.lty to 5 to draw edges with long dashes.

set.seed(3)
grades <- nplot(
  net_with_no_isolates,
  vertex.label=NULL,
  edge.width.range = c(1,1),
  vertex.color = color_nodes(net_with_no_isolates, "grade", c("blue","red3")),
  edge.color = ~ego(alpha = 0.5) + alter(alpha = 0.5),
  edge.curvature = 0,
  edge.line.lty = 5)
  
  
print(grades)

Other Options

Using the same plot that we originally created, we can also adjust some of the aspects outside of vertices and edges.

  • Adjust bg.col to make background color slate gray.

  • Adjust sample.edges to select a proportion of the edges.

## Plot with no isolates
set.seed(3)
nplot(
  net_with_no_isolates,
  vertex.label=NULL,
  bg.col = "slategray1",
  sample.edges = .5) 
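To see what drawing only a proportion of the edges amounts to, the base R sketch below samples half of the rows of a made-up edge list. This is an illustration of the idea under our own assumptions, not a description of how netplot implements sample.edges internally.

```r
set.seed(3)

# Made-up edge list with 10 edges
edges <- data.frame(from = 1:10, to = c(2:10, 1))

# Keep a random half of the edges; all vertices stay in the plot
proportion  <- 0.5
keep        <- sort(sample(nrow(edges), size = ceiling(nrow(edges) * proportion)))
edges_shown <- edges[keep, ]

nrow(edges_shown)
```

Thinning edges this way reduces clutter in dense graphs, but which edges are kept depends on the random seed.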

We can adjust things to get a different outcome.

  • Adjust skip.edges to remove edges altogether.

  • Adjust bg.col to misty rose.

  • Adjust zero.margins to true.

## Plot with no isolates
set.seed(3)
nplot(
  net_with_no_isolates,
  vertex.label=NULL,
  skip.edges = TRUE,
  bg.col = "mistyrose",
  zero.margins = TRUE
  ) 

Conclusion

The middle school data set illustrates what netplot can do: there are options to adjust the vertices, the edges, and plot-level parameters such as the background color and margins.

Example 2: Healthcare data

This data set comes from “Assessing Pathogen Transmission Opportunities: Variation in Nursing Home Staff-Resident Interactions” by Chang et al. It explores connections between patients and healthcare providers in a number of nursing homes across 7 states. There are 99 networks in the data set.

With this data, we will explore how multiple smaller networks can work together to tell a story and can be plotted using netplot.

Cleaning Data

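First, we load the data along with the additional package we will be using.

# attaching packages
library(network)

# load() restores the saved objects into the workspace and returns only their
# names; the code below expects a list called `networks`
loaded_objects <- load("./data/nursing_home/network99_f1.RData")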
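Because load() returns the names of the restored objects rather than the objects themselves, it can help to see its behavior on a small file. The sketch below writes a stand-in object to a temporary file and reads it back; the contents of the stand-in list are made up.

```r
# Save a stand-in object called `networks` to a temporary file
networks <- list(a = 1:3, b = 4:6)
tmp <- tempfile(fileext = ".RData")
save(networks, file = tmp)
rm(networks)

# load() restores the object under its original name and returns that name
loaded_names <- load(tmp)
loaded_names        # "networks"
exists("networks")  # TRUE
length(networks)    # 2
```

The value returned by load() is therefore useful for checking what a file contained, while the data are accessed through the restored object names.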

First Plot

We are now ready to plot the data, as it is already in a clean format. First, let’s pull out the first and second networks so we can take a closer look at them.

# Creates an empty list to store the plots
nets <- list()

# Sets a seed for reproducibility
set.seed(1231)

for (i in 1:2) {  # Only the first two networks
  # Extracts the "is_actor" vertex attribute (TRUE for healthcare providers)
  is_health_care_provider <- networks[[i]] %v% "is_actor"
  nets[[i]] <- nplot(
    networks[[i]],
    # Colors HCP vertices gray and patient vertices red
    vertex.color = ifelse(is_health_care_provider, "gray40", "red3"),
    # Makes HCP vertices four-sided and patient vertices round (10 sides)
    vertex.nsides = ifelse(is_health_care_provider, 4, 10),
    # Makes HCP vertices larger than patient vertices
    vertex.size = ifelse(is_health_care_provider, .25, .15),
    vertex.size.range = c(.015, .065),
    edge.width.range = c(.25, .5),
    # Draws each edge as a single segment, colored light gray
    edge.line.breaks = 1,
    edge.color = ~ego(alpha = 1, col = "lightgray") + alter(alpha = 1, col = "lightgray"),
    edge.curvature = pi / 6,
    # Removes vertex labels
    vertex.label = NULL
  )
}

# Combines the 2 plots into a 1x2 grid
allgraphs <- gridExtra::grid.arrange(grobs = nets, nrow = 1, ncol = 2)

Here, healthcare providers are represented by gray diamonds, while patients are represented by red circles.
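The %v% operator used in the loop comes from the network package and reads (or sets) a vertex attribute. The sketch below shows it on a made-up four-vertex network; the attribute values are assumptions for illustration.

```r
library(network)

# Made-up network with four vertices: one provider and three patients
toy <- network.initialize(4, directed = FALSE)

# Set a logical vertex attribute, then read it back with %v%
toy %v% "is_actor" <- c(TRUE, FALSE, FALSE, FALSE)
is_hcp <- toy %v% "is_actor"
is_hcp

# The same ifelse() pattern as above turns the attribute into colors
vertex_colors <- ifelse(is_hcp, "gray40", "red3")
vertex_colors
```

Since the attribute is returned as a plain vector with one entry per vertex, it can be passed to ifelse() to build any per-vertex graphing parameter.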

Second Plot

Much like the previous example, we can use the different aspects of netplot to adjust how the graph looks.

  • Adjust vertex.color so providers are purple instead of gray and patients are pink instead of red.

  • Adjust vertex.nsides so providers are triangles and patients are hexagons.

  • Adjust edge.line.breaks to make the edges curved instead of straight.

  • Adjust edge.color so edges are now black instead of gray.

    • Adjust alpha so the black is slightly transparent.

  • Adjust edge.curvature to make the edges more curved.

# Creates an empty list to store the plots
nets <- list()

# Sets a seed for reproducibility
set.seed(1231)

for (i in 1:2) {  # Only the first two networks
  # Extracts the "is_actor" vertex attribute (TRUE for healthcare providers)
  is_health_care_provider <- networks[[i]] %v% "is_actor"
  nets[[i]] <- nplot(
    networks[[i]],
    # Colors HCP vertices purple and patient vertices pink
    vertex.color = ifelse(is_health_care_provider, "purple", "pink"),
    # Makes HCP vertices triangles and patient vertices hexagons
    vertex.nsides = ifelse(is_health_care_provider, 3, 6),
    # Makes HCP vertices larger than patient vertices
    vertex.size = ifelse(is_health_care_provider, .25, .15),
    vertex.size.range = c(.015, .065),
    edge.width.range = c(.25, .5),
    # Draws each edge with 6 line breaks (curved), in slightly transparent black
    edge.line.breaks = 6,
    edge.color = ~ego(alpha = .8, col = "black") + alter(alpha = .8, col = "black"),
    edge.curvature = pi / 3,
    # Removes vertex labels
    vertex.label = NULL
  )
}

# Combines the 2 plots into a 1x2 grid
allgraphs <- gridExtra::grid.arrange(grobs = nets, nrow = 1, ncol = 2)

99 Plots

Now that we understand what these networks look like at a closer level, we can plot them all for comparison.

# Creates an empty list to store the plots
nets <- list()

# Sets a seed for reproducibility
set.seed(1231)

for (i in 1:99) {
  # Extracts the "is_actor" vertex attribute (TRUE for healthcare providers)
  is_health_care_provider <- networks[[i]] %v% "is_actor"
  nets[[i]] <- nplot(
    networks[[i]],
    # Colors HCP vertices gray and patient vertices red
    vertex.color = ifelse(is_health_care_provider, "gray40", "red3"),
    # Makes HCP vertices four-sided and patient vertices round (10 sides)
    vertex.nsides = ifelse(is_health_care_provider, 4, 10),
    # Makes HCP vertices larger than patient vertices
    vertex.size = ifelse(is_health_care_provider, .25, .15),
    vertex.size.range = c(.015, .065),
    edge.width.range = c(.25, .5),
    # Draws each edge as a single segment, colored light gray
    edge.line.breaks = 1,
    edge.color = ~ego(alpha = 1, col = "lightgray") + alter(alpha = 1, col = "lightgray"),
    edge.curvature = pi / 6,
    # Removes vertex labels
    vertex.label = NULL
  )
}

# Combines the 99 plots into an 11x9 grid
allgraphs <- gridExtra::grid.arrange(grobs = nets, nrow=11, ncol=9) 
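The grid dimensions above follow from the number of plots. As a small sketch, the number of rows for a given number of columns can be computed rather than hard-coded; the variable names here are made up for illustration.

```r
n_plots <- 99
n_cols  <- 9

# Smallest number of rows that fits all plots in n_cols columns
n_rows <- ceiling(n_plots / n_cols)

c(rows = n_rows, cols = n_cols)  # 11 rows, 9 columns
n_rows * n_cols - n_plots        # 0 empty cells in the grid
```

The computed values could then be passed as nrow and ncol if the number of networks changes.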

Conclusion

As this example shows, netplot can be used inside a for loop to create a large number of graphs from a large amount of data very quickly. Having these graphs side by side allows for quick and easy analysis of their similarities and differences.

Adai, Alex T., Shailesh V. Date, Shannon Wieland, and Edward M. Marcotte. 2004. “LGL: Creating a Map of Protein Function with an Algorithm for Visualizing Very Large Biological Networks.” Journal of Molecular Biology 340 (1): 179–90. https://doi.org/10.1016/j.jmb.2004.04.047.
Barnes, J. A. 1954. “Class and Committees in a Norwegian Island Parish.” Human Relations 7 (1): 39–58. https://doi.org/10.1177/001872675400700102.
Bott, Elizabeth. 2002. “Conjugal Roles and Social Networks.” In Family and Social Network. Routledge.
Burgette, Jacqueline M., Jacquelin Rankine, Alison J. Culyba, Kar-Hai Chu, and Kathleen M. Carley. 2021. “Best Practices for Modeling Egocentric Social Network Data and Health Outcomes.” HERD: Health Environments Research & Design Journal 14 (4): 18–34. https://doi.org/10.1177/19375867211013772.
Csardi, Gabor, and Tamas Nepusz. 2006. “The Igraph Software Package for Complex Network Research.” InterJournal Complex Systems: 1695. https://igraph.org.
Csárdi, Gábor, Tamás Nepusz, Vincent Traag, Szabolcs Horvát, Fabio Zanini, Daniel Noom, and Kirill Müller. 2023. igraph: Network Analysis and Visualization in R. https://doi.org/10.5281/zenodo.7682609.
Dowle, Matt, and Arun Srinivasan. 2023. data.table: Extension of ‘data.frame’.
Fruchterman, Thomas M. J., and Edward M. Reingold. 1991. “Graph Drawing by Force-Directed Placement.” Software: Practice and Experience 21 (11): 1129–64. https://doi.org/10.1002/spe.4380211102.
Granovetter, Mark S. 1973. “The Strength of Weak Ties.” American Journal of Sociology, May. https://doi.org/10.1086/225469.
Grapov, Dmitry, and John W. Newman. 2012. “imDEV: A Graphical User Interface to R Multivariate Analysis Tools in Microsoft Excel.” Bioinformatics 28 (17): 2288–90. https://doi.org/10.1093/bioinformatics/bts439.
Hansen, Derek L., Ben Shneiderman, and Marc A. Smith. 2011. “Chapter 7 - Clustering and Grouping.” In Analyzing Social Media Networks with NodeXL, edited by Derek L. Hansen, Ben Shneiderman, and Marc A. Smith, 93–102. Boston: Morgan Kaufmann. https://doi.org/10.1016/B978-0-12-382229-1.00007-2.
Kamada, Tomihisa, and Satoru Kawai. 1989. “An Algorithm for Drawing General Undirected Graphs.” Information Processing Letters 31 (1).
Knisley, Debra J., and Jeff R. Knisley. 2014. “Seeing the Results of a Mutation with a Vertex Weighted Hierarchical Graph.” BMC Proceedings 8 (2): S7. https://doi.org/10.1186/1753-6561-8-S2-S7.
Lin, Lifeng. 2018. “Quantifying and Presenting Overall Evidence in Network Meta-Analysis.” Statistics in Medicine 37 (28): 4114–25. https://doi.org/10.1002/sim.7905.
Marsden, Peter V., and Betina Hollstein. 2023. “Advances and Innovations in Methods for Collecting Egocentric Network Data.” Social Science Research 109 (January): 102816. https://doi.org/10.1016/j.ssresearch.2022.102816.
Martin, Shawn, W. Michael Brown, and Brian N. Wylie. 2007. “Dr.L: Distributed Recursive (Graph) Layout.” dRl. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). https://doi.org/10.11578/dc.20210416.20.
Merton, Robert King. 1968. Social Theory and Social Structure. Simon and Schuster.
Ognyanova, Katherine. n.d. “Network Visualization with R.”
Sharma, Shalini, and Jerry Chou. 2022. “Accelerate Incremental TSP Algorithms on Time Evolving Graphs with Partitioning Methods.” Algorithms 15 (2): 64. https://doi.org/10.3390/a15020064.
Six, Janet M., and Ioannis G. Tollis. 1999. “A Framework for Circular Drawings of Networks.” In Graph Drawing, edited by Jan Kratochvíl, 107–16. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer. https://doi.org/10.1007/3-540-46648-7_11.
Tyner, Samantha C., François Briatte, and Heike Hofmann. 2017. “Network Visualization with Ggplot2.” The R Journal, May. https://hal.science/hal-01722543.
Vega Yon, George, and Porter Bischoff. 2023. netplot: Beautiful Graph Drawing. https://github.com/USCCANA/netplot.
Wellman, Barry. 1979. “The Community Question: The Intimate Networks of East Yorkers.” American Journal of Sociology 84 (5): 1201–31. https://doi.org/10.1086/226906.
Zien, J. Y., M. D. F. Schlag, and P. K. Chan. 1999. “Multilevel Spectral Hypergraph Partitioning with Arbitrary Vertex Sizes.” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 18 (9): 1389–99. https://doi.org/10.1109/43.784130.